尽管像主组件分析一样,经典缩放是无参数的,但大多数用于嵌入多元数据的方法都需要选择一个或几个参数。由于情况的无监督性,这种调整可能很困难。我们提出了一种简单,几乎明显的方法来监督调整参数的选择:最大程度地减少压力的概念。我们通过参考刚性理论来证实这种选择。我们扩展了Aspnes等人的结果。 (IEEE移动计算,2006年),表明一般的随机几何图形是具有很高概率的三材料图。我们提供了稳定结果\ a la anderson等。 (SIAM离散数学,2010年)。我们在Shang和Ruml的MDS-MAP(P)算法的背景下说明了这种方法(IEEE Infocom,2004)。作为一种典型的补丁方法,它需要选择补丁大小,我们使用压力来使该选择数据驱动。在这种情况下,我们执行许多实验来说明使用应力作为调整参数选择的基础的有效性。这样一来,我们揭示了一个偏见差异的权衡,这是一种现象,在多维缩放文献中可能被忽略了。通过将MDS-MAP(P)变成一种流形学习方法,我们获得了ISOMAP的局部版本,为此,应力最小化也可以用于参数调整。
translated by 谷歌翻译
This paper presents a methodology for integrating machine learning techniques into metaheuristics for solving combinatorial optimization problems. Namely, we propose a general machine learning framework for neighbor generation in metaheuristic search. We first define an efficient neighborhood structure constructed by applying a transformation to a selected subset of variables from the current solution. Then, the key of the proposed methodology is to generate promising neighbors by selecting a proper subset of variables that contains a descent of the objective in the solution space. To learn a good variable selection strategy, we formulate the problem as a classification task that exploits structural information from the characteristics of the problem and from high-quality solutions. We validate our methodology on two metaheuristic applications: a Tabu Search scheme for solving a Wireless Network Optimization problem and a Large Neighborhood Search heuristic for solving Mixed-Integer Programs. The experimental results show that our approach is able to achieve a satisfactory trade-off between the exploration of a larger solution space and the exploitation of high-quality solution regions on both applications.
translated by 谷歌翻译
We investigate the asymptotic properties of deep Residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of stochastic differential equations (SDEs). Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
translated by 谷歌翻译
Video understanding is a growing field and a subject of intense research, which includes many interesting tasks to understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, video retrieval. One of the most challenging problems in video understanding is dealing with feature extraction, i.e. extract contextual visual representation from given untrimmed video due to the long and complicated temporal structure of unconstrained videos. Different from existing approaches, which apply a pre-trained backbone network as a black-box to extract visual representation, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is very crucial to design a contextual explainable video representation extraction that can capture each of such factors and model the relationships between them. In this paper, we discuss approaches, that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception based-contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.
translated by 谷歌翻译
Warning: this paper contains content that may be offensive or upsetting. Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements, which can be traced to characteristics of slang as a quickly evolving and highly subjective language.
translated by 谷歌翻译
Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
translated by 谷歌翻译
To meet the fairly high safety and reliability requirements in practice, the state of health (SOH) estimation of Lithium-ion batteries (LIBs), which has a close relationship with the degradation performance, has been extensively studied with the widespread applications of various electronics. The conventional SOH estimation approaches with digital twin are end-of-cycle estimation that require the completion of a full charge/discharge cycle to observe the maximum available capacity. However, under dynamic operating conditions with partially discharged data, it is impossible to sense accurate real-time SOH estimation for LIBs. To bridge this research gap, we put forward a digital twin framework to gain the capability of sensing the battery's SOH on the fly, updating the physical battery model. The proposed digital twin solution consists of three core components to enable real-time SOH estimation without requiring a complete discharge. First, to handle the variable training cycling data, the energy discrepancy-aware cycling synchronization is proposed to align cycling data with guaranteeing the same data structure. Second, to explore the temporal importance of different training sampling times, a time-attention SOH estimation model is developed with data encoding to capture the degradation behavior over cycles, excluding adverse influences of unimportant samples. Finally, for online implementation, a similarity analysis-based data reconstruction has been put forward to provide real-time SOH estimation without requiring a full discharge cycle. Through a series of results conducted on a widely used benchmark, the proposed method yields the real-time SOH estimation with errors less than 1% for most sampling times in ongoing cycles.
translated by 谷歌翻译
Assessing the critical view of safety in laparoscopic cholecystectomy requires accurate identification and localization of key anatomical structures, reasoning about their geometric relationships to one another, and determining the quality of their exposure. In this work, we propose to capture each of these aspects by modeling the surgical scene with a disentangled latent scene graph representation, which we can then process using a graph neural network. Unlike previous approaches using graph representations, we explicitly encode in our graphs semantic information such as object locations and shapes, class probabilities and visual features. We also incorporate an auxiliary image reconstruction objective to help train the latent graph representations. We demonstrate the value of these components through comprehensive ablation studies and achieve state-of-the-art results for critical view of safety prediction across multiple experimental settings.
translated by 谷歌翻译
The introduction of high-quality image generation models, particularly the StyleGAN family, provides a powerful tool to synthesize and manipulate images. However, existing models are built upon high-quality (HQ) data as desired outputs, making them unfit for in-the-wild low-quality (LQ) images, which are common inputs for manipulation. In this work, we bridge this gap by proposing a novel GAN structure that allows for generating images with controllable quality. The network can synthesize various image degradation and restore the sharp image via a quality control code. Our proposed QC-StyleGAN can directly edit LQ images without altering their quality by applying GAN inversion and manipulation techniques. It also provides for free an image restoration solution that can handle various degradations, including noise, blur, compression artifacts, and their mixtures. Finally, we demonstrate numerous other applications such as image degradation synthesis, transfer, and interpolation.
translated by 谷歌翻译
The proliferation of smartphones has accelerated mobility studies by largely increasing the type and volume of mobility data available. One such source of mobility data is from GPS technology, which is becoming increasingly common and helps the research community understand mobility patterns of people. However, there lacks a standardized framework for studying the different mobility patterns created by the non-Work, non-Home locations of Working and Nonworking users on Workdays and Offdays using machine learning methods. We propose a new mobility metric, Daily Characteristic Distance, and use it to generate features for each user together with Origin-Destination matrix features. We then use those features with an unsupervised machine learning method, $k$-means clustering, and obtain three clusters of users for each type of day (Workday and Offday). Finally, we propose two new metrics for the analysis of the clustering results, namely User Commonality and Average Frequency. By using the proposed metrics, interesting user behaviors can be discerned and it helps us to better understand the mobility patterns of the users.
translated by 谷歌翻译